23 research outputs found

    Effect of Resting-State fNIRS Scanning Duration on Functional Brain Connectivity and Graph Theory Metrics of Brain Network

    Get PDF
    As an emerging brain imaging technique, functional near-infrared spectroscopy (fNIRS) has attracted widespread attention for advancing resting-state functional connectivity (FC) and graph-theoretical analyses of brain networks. However, it remains largely unknown how fNIRS scanning duration relates to stable and reproducible functional brain network features. To answer this question, we collected resting-state fNIRS signals (10-min duration, two runs) from 18 participants and then truncated the hemodynamic time series into segments ranging from 1 to 10 min in 30-s increments. Measures of nodal efficiency, nodal betweenness, network local efficiency, global efficiency and clustering coefficient were computed for each subject at each acquisition duration. Stability and between-run reproducibility analyses were performed to identify the optimal time length for each measure. We found that FC, nodal efficiency and nodal betweenness stabilized and were reproducible after 1 min of fNIRS signal acquisition, whereas the network clustering coefficient and the local and global efficiencies stabilized after 1 min, with only the local and global efficiencies becoming reproducible after 5 min. These quantitative results provide direct evidence for choosing a resting-state fNIRS scanning duration that yields stable and reproducible functional connectivity and topological metrics of brain networks.
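    A minimal sketch of the kind of analysis the abstract describes, computing graph metrics from a thresholded FC matrix with networkx; the channel count, correlation threshold and random data are illustrative assumptions, not values from the study. Repeating this over truncated segments of increasing duration would mirror the stability analysis above.

```python
# Graph-theory metrics from a functional connectivity (FC) matrix.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
signals = rng.standard_normal((600, 46))    # 600 samples x 46 fNIRS channels (hypothetical)

fc = np.corrcoef(signals.T)                 # Pearson FC matrix (channels x channels)
np.fill_diagonal(fc, 0.0)

adj = (np.abs(fc) > 0.3).astype(int)        # binarize at an assumed threshold of 0.3
G = nx.from_numpy_array(adj)

global_eff = nx.global_efficiency(G)        # network global efficiency
local_eff = nx.local_efficiency(G)          # network local efficiency
clustering = nx.average_clustering(G)       # network clustering coefficient
betweenness = nx.betweenness_centrality(G)  # nodal betweenness per channel

print(f"global eff {global_eff:.3f}  local eff {local_eff:.3f}  "
      f"clustering {clustering:.3f}  mean betweenness "
      f"{np.mean(list(betweenness.values())):.3f}")
```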

    Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

    Full text link
    Automatic recognition of disordered speech remains a highly challenging task to date. The underlying neuro-motor conditions, often compounded with co-occurring physical disabilities, make it difficult to collect the large quantities of impaired speech required for ASR system development. This paper presents novel variational auto-encoder generative adversarial network (VAE-GAN) based personalized disordered speech augmentation approaches that simultaneously learn to encode, generate and discriminate synthesized impaired speech. Separate latent features are derived to learn dysarthric speech characteristics and phoneme context representations. Self-supervised pre-trained wav2vec 2.0 embedding features are also incorporated. Experiments conducted on the UASpeech corpus suggest that the proposed adversarial data augmentation approach consistently outperformed the baseline speed perturbation and non-VAE GAN augmentation methods when training hybrid TDNN and end-to-end Conformer systems. After LHUC speaker adaptation, the best system using VAE-GAN based augmentation produced an overall WER of 27.78% on the UASpeech test set of 16 dysarthric speakers, and the lowest published WER of 57.31% on the subset of speakers with "Very Low" intelligibility. Comment: Submitted to ICASSP 202
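    A minimal, self-contained sketch of the VAE-GAN objective underlying this style of augmentation: an encoder/decoder pair trained with reconstruction and KL losses, plus a discriminator judging real versus synthesized features. All dimensions, loss weights and the random batch are illustrative assumptions; the paper's actual architecture and losses differ in detail.

```python
import torch
import torch.nn as nn

FEAT, LATENT = 80, 16   # e.g. 80-dim filterbank frames, 16-dim latent (assumed)

enc = nn.Sequential(nn.Linear(FEAT, 128), nn.ReLU(), nn.Linear(128, 2 * LATENT))
dec = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(), nn.Linear(128, FEAT))
disc = nn.Sequential(nn.Linear(FEAT, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

opt_g = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

x = torch.randn(32, FEAT)                   # a batch of real speech features (placeholder)

# Generator (VAE) step: encode, reparameterize, decode.
mu, logvar = enc(x).chunk(2, dim=-1)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
x_hat = dec(z)
recon = (x_hat - x).pow(2).mean()                           # reconstruction loss
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()  # KL divergence
adv = bce(disc(x_hat), torch.ones(32, 1))                   # fool the discriminator
opt_g.zero_grad(); (recon + kl + 0.01 * adv).backward(); opt_g.step()

# Discriminator step: distinguish real from synthesized features.
d_loss = bce(disc(x), torch.ones(32, 1)) + bce(disc(x_hat.detach()), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()
```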

    Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

    Full text link
    Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date. Motivated by the invariance of the visual modality to acoustic signal corruption, an audio-visual multi-channel speech separation, dereverberation and recognition approach featuring full incorporation of visual information into all system components is proposed in this paper. The efficacy of the video input is consistently demonstrated in the mask-based MVDR speech separation, the DNN-WPE or spectral mapping (SpecM) based speech dereverberation front-end, and the Conformer ASR back-end. Audio-visual integrated front-end architectures performing speech separation and dereverberation in a pipelined or joint fashion via mask-based WPD are investigated. The error cost mismatch between the speech enhancement front-end and ASR back-end components is minimized by end-to-end joint fine-tuning using either the ASR cost function alone or its interpolation with the speech enhancement loss. Experiments were conducted on overlapped and reverberant mixture speech data constructed by simulation or replay of the Oxford LRS2 dataset. The proposed audio-visual multi-channel speech separation, dereverberation and recognition systems consistently outperformed the comparable audio-only baseline by 9.1% and 6.2% absolute (41.7% and 36.0% relative) word error rate (WER) reductions. Consistent speech enhancement improvements were also obtained on PESQ, STOI and SRMR scores. Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processing
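    A minimal numpy sketch of the mask-based MVDR beamforming mentioned above: time-frequency masks weight the multi-channel observations to estimate speech and noise spatial covariances, from which the MVDR filter is derived. Shapes, the reference channel and the random inputs are illustrative assumptions; in the paper the masks come from an audio-visual estimation network.

```python
import numpy as np

C, T = 4, 100                               # channels, time frames (one frequency bin shown)
rng = np.random.default_rng(0)
Y = rng.standard_normal((C, T)) + 1j * rng.standard_normal((C, T))  # STFT observations
speech_mask = rng.uniform(size=T)           # would come from a mask-estimation DNN
noise_mask = 1.0 - speech_mask

# Mask-weighted spatial covariance matrices.
phi_s = (speech_mask * Y) @ Y.conj().T / speech_mask.sum()
phi_n = (noise_mask * Y) @ Y.conj().T / noise_mask.sum()

# MVDR filter via the reference-channel formulation:
#   w = (phi_n^{-1} phi_s / trace(phi_n^{-1} phi_s)) @ u_ref
num = np.linalg.solve(phi_n, phi_s)
w = num[:, 0] / np.trace(num)               # reference channel 0 (assumed)

X_hat = w.conj() @ Y                        # beamformed single-channel output
```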

    Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

    Full text link
    Automatic recognition of disordered and elderly speech remains a highly challenging task to date due to the difficulty of collecting such data in large quantities. This paper explores a series of approaches to integrate domain-adapted SSL pre-trained models into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition: a) input feature fusion between standard acoustic front-ends and domain-adapted wav2vec 2.0 speech representations; b) frame-level joint decoding of TDNN systems separately trained using standard acoustic features alone and with additional wav2vec 2.0 features; and c) multi-pass decoding in which the TDNN/Conformer system outputs are rescored using domain-adapted wav2vec 2.0 models. In addition, domain-adapted wav2vec 2.0 representations are utilized in acoustic-to-articulatory (A2A) inversion to construct multi-modal dysarthric and elderly speech recognition systems. Experiments conducted on the UASpeech dysarthric and DementiaBank Pitt elderly speech corpora suggest that TDNN and Conformer ASR systems integrating domain-adapted wav2vec 2.0 models consistently outperform the standalone wav2vec 2.0 models by statistically significant WER reductions of 8.22% and 3.43% absolute (26.71% and 15.88% relative) on the two tasks respectively. The lowest published WERs of 22.56% (52.53% on very low intelligibility, 39.09% on unseen words) on the UASpeech test set of 16 dysarthric speakers and 18.17% on the DementiaBank Pitt test set are obtained. Comment: accepted by ICASSP 202
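    A minimal sketch of the input feature fusion idea in a): frame-level concatenation of standard filterbank features with wav2vec 2.0 representations. The pretrained torchaudio pipeline and the simple 2x upsampling used to align the two frame rates are illustrative assumptions; the paper uses domain-adapted models rather than the off-the-shelf checkpoint.

```python
import torch
import torchaudio

wav, sr = torch.randn(1, 16000), 16000      # 1 s of placeholder audio at 16 kHz

# Standard acoustic front-end: 80-dim log-Mel filterbanks (10 ms hop).
fbank = torchaudio.compliance.kaldi.fbank(wav, num_mel_bins=80, sample_frequency=sr)

# SSL front-end: wav2vec 2.0 final-layer features (20 ms stride).
bundle = torchaudio.pipelines.WAV2VEC2_BASE
ssl_model = bundle.get_model().eval()
with torch.no_grad():
    feats, _ = ssl_model.extract_features(wav)
w2v = feats[-1].squeeze(0)                  # (frames, 768)

# Align frame rates (wav2vec 2.0 is ~2x coarser) and concatenate.
w2v_up = w2v.repeat_interleave(2, dim=0)
n = min(fbank.shape[0], w2v_up.shape[0])
fused = torch.cat([fbank[:n], w2v_up[:n]], dim=-1)   # (n, 80 + 768)
```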

    Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

    Full text link
    Fundamental modelling differences between hybrid and end-to-end (E2E) automatic speech recognition (ASR) systems create large diversity and complementarity between them. This paper investigates multi-pass rescoring and cross-adaptation based system combination approaches for hybrid TDNN and Conformer E2E ASR systems. In multi-pass rescoring, a state-of-the-art hybrid LF-MMI trained CNN-TDNN system featuring speed perturbation, SpecAugment and Bayesian learning hidden unit contributions (LHUC) speaker adaptation was used to produce initial N-best outputs, which were then rescored by the speaker-adapted Conformer system using two-way cross-system score interpolation. In cross adaptation, the hybrid CNN-TDNN system was adapted to the 1-best output of the Conformer system, or vice versa. Experiments on the 300-hour Switchboard corpus suggest that the combined systems derived using either of the two system combination approaches outperformed the individual systems. The best combined system, obtained using multi-pass rescoring, produced statistically significant word error rate (WER) reductions of 2.5% to 3.9% absolute (22.5% to 28.9% relative) over the standalone Conformer system on the NIST Hub5'00, Rt03 and Rt02 evaluation data. Comment: accepted to ISCA 202
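    A minimal sketch of the two-pass rescoring idea: each hypothesis in the first-pass N-best list is rescored by interpolating the two systems' scores, and the best rescored hypothesis is selected. The hypotheses, score values and interpolation weight here are illustrative assumptions.

```python
def combine_nbest(nbest, weight=0.5):
    """nbest: list of (hypothesis, hybrid_score, conformer_score) tuples,
    with scores as log-likelihoods; returns the best rescored hypothesis."""
    rescored = [
        (hyp, weight * s_hybrid + (1.0 - weight) * s_conformer)
        for hyp, s_hybrid, s_conformer in nbest
    ]
    return max(rescored, key=lambda pair: pair[1])

nbest = [
    ("the cat sat", -12.3, -10.1),   # placeholder scores
    ("the cats at", -11.9, -13.4),
    ("a cat sat",   -13.0, -12.2),
]
print(combine_nbest(nbest, weight=0.5))
```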

    Intersecting distributed networks support convergent linguistic functioning across different languages in bilinguals

    Get PDF
    How bilingual brains accomplish the processing of more than one language has been widely investigated by neuroimaging studies. The assimilation-accommodation hypothesis holds that second language processing is implemented both by the same neural networks that support the native language and by additional new networks. However, whether and how this hypothesis applies at finer-grained levels of both brain anatomical organization and linguistic function remains unknown. To address this issue, we scanned Chinese-English bilinguals during an implicit reading task involving Chinese words, English words and Chinese pinyin. We observed broad cortical regions wherein interdigitated, distributed neural populations supported the same cognitive components of different languages. Although spatially separate, neural populations in regions including the opercular and triangular parts of the inferior frontal gyrus, the temporal pole, the superior and middle temporal gyri, the precentral gyrus and the supplementary motor areas were found to perform the same linguistic functions across languages, indicating regional-level functional assimilation supported by voxel-wise anatomical accommodation. Taken together, these findings not only verify the functional independence of the neural representations of different languages but also reveal a co-representation organization of both languages in most language regions, with linguistic-feature-specific accommodation and assimilation between the first and second languages.

    Who can help me? Understanding the antecedent and consequence of medical information seeking behavior in the era of bigdata

    Get PDF
    Introduction: The advent of the big data era has fundamentally transformed the nature of medical information seeking and the traditional binary medical relationship. Weaving together stress coping theory and information processing theory, we developed an integrative perspective on information seeking behavior and explored the antecedents and consequences of such behavior. Methods: Data were collected from 573 women suffering from infertility who were seeking assisted reproductive technology treatment in China. We used AMOS 22.0 and the PROCESS macro in SPSS 25.0 to test our model. Results: Our findings demonstrated that patients’ satisfaction with the information received from their physicians negatively predicted their behavioral involvement in information seeking; such behavior was positively related to their perceived information overload, and the latter was negatively related to patient-physician relationship quality. Further findings showed that medical information seeking behavior and perceived information overload serially mediate the impact of satisfaction with information received from physicians on patient-physician relationship quality. Discussion: This study extends knowledge of information seeking behavior by proposing an integrative model and expands the application of stress coping theory and information processing theory. Additionally, it provides valuable implications for patients, physicians and public health information service providers.
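    A minimal sketch of the serial mediation structure described above (satisfaction -> information seeking -> perceived overload -> relationship quality), estimated with plain OLS and a bootstrap confidence interval for the serial indirect effect. All data here are simulated for illustration; the study itself used the PROCESS macro (model 6) in SPSS.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 573
X = rng.standard_normal(n)                        # satisfaction with information
M1 = -0.4 * X + rng.standard_normal(n)            # information seeking behavior
M2 = 0.5 * M1 + rng.standard_normal(n)            # perceived information overload
Y = -0.3 * M2 + 0.1 * X + rng.standard_normal(n)  # relationship quality

def slope(y, *preds):
    """OLS slope of y on the first predictor, controlling for the rest."""
    Z = np.column_stack([np.ones_like(y)] + list(preds))
    return np.linalg.lstsq(Z, y, rcond=None)[0][1]

def serial_indirect(idx):
    a1 = slope(M1[idx], X[idx])                    # X -> M1
    d21 = slope(M2[idx], M1[idx], X[idx])          # M1 -> M2, controlling X
    b2 = slope(Y[idx], M2[idx], M1[idx], X[idx])   # M2 -> Y, controlling M1, X
    return a1 * d21 * b2                           # serial indirect effect

boot = [serial_indirect(rng.integers(0, n, n)) for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"serial indirect effect 95% CI: [{lo:.3f}, {hi:.3f}]")
```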

    Finishing the euchromatic sequence of the human genome

    Get PDF
    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition

    Full text link
    Articulatory features are inherently invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition (ASR) systems designed for normal speech. Their practical application to atypical task domains such as elderly and disordered speech across languages is often limited by the difficulty of collecting such specialist data from target speakers. This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio, visual and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus for A2A model pre-training, before cross-domain and cross-lingual adaptation to three datasets across two languages: the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech corpora, and the English TORGO dysarthric speech data, to produce UTI-based articulatory features. Experiments conducted on the three tasks suggest that incorporating the generated articulatory features consistently outperformed the baseline hybrid TDNN and Conformer based end-to-end systems constructed using acoustic features alone, with statistically significant word error rate or character error rate reductions of up to 2.64%, 1.92% and 1.21% absolute (8.17%, 7.89% and 13.28% relative) after data augmentation and speaker adaptation were applied. Comment: arXiv admin note: text overlap with arXiv:2203.1027
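    A minimal sketch of the acoustic-to-articulatory (A2A) inversion idea: a regression network maps acoustic frames to articulatory features, is pre-trained on parallel acoustic/UTI data, and is then used to generate articulatory features for domains where no UTI recordings exist. Dimensions, data and network depth are illustrative assumptions; the paper's actual inversion model and adaptation procedure differ.

```python
import torch
import torch.nn as nn

ACOUSTIC, ARTIC = 80, 32                     # feature dims (assumed)
a2a = nn.Sequential(nn.Linear(ACOUSTIC, 256), nn.ReLU(),
                    nn.Linear(256, 256), nn.ReLU(),
                    nn.Linear(256, ARTIC))
opt = torch.optim.Adam(a2a.parameters(), lr=1e-3)

# Pre-training on parallel data (the TaL corpus provides real such pairs).
acoustic = torch.randn(1024, ACOUSTIC)       # placeholder acoustic frames
artic = torch.randn(1024, ARTIC)             # placeholder UTI-derived targets
for _ in range(10):
    loss = nn.functional.mse_loss(a2a(acoustic), artic)
    opt.zero_grad(); loss.backward(); opt.step()

# Inversion on a target-domain utterance without UTI data: the generated
# articulatory features are appended to the ASR system's acoustic input.
target = torch.randn(200, ACOUSTIC)
fused = torch.cat([target, a2a(target).detach()], dim=-1)  # (200, 80 + 32)
```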